The Bellman equation in (2.7) is in an elementwise form. Since it is valid for every state, we can combine all these equations and write them concisely in a matrix-vector form, which will be frequently used to analyze the Bellman equation.
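To derive the matrix-vector form, we first rewrite the Bellman equation in (2.7) as

$$v_\pi(s) = r_\pi(s) + \gamma \sum_{s' \in \mathcal{S}} p_\pi(s'|s)\, v_\pi(s'), \qquad (2.8)$$

where

$$r_\pi(s) \doteq \sum_{a \in \mathcal{A}} \pi(a|s) \sum_{r \in \mathcal{R}} p(r|s,a)\, r, \qquad p_\pi(s'|s) \doteq \sum_{a \in \mathcal{A}} \pi(a|s)\, p(s'|s,a).$$

Here, $r_\pi(s)$ denotes the mean of the immediate rewards, and $p_\pi(s'|s)$ is the probability of transitioning from $s$ to $s'$ under policy $\pi$.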
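The two policy-averaged quantities, the mean immediate reward $r_\pi(s)$ and the transition probability $p_\pi(s'|s)$, can be sketched for a small tabular model as follows. The two-state, two-action model, its numerical values, and the names `policy`, `reward_dist`, `trans`, `mean_reward`, and `transition_prob` are all hypothetical and chosen only for illustration.

```python
# Hypothetical tabular model: 2 states, 2 actions (illustrative values only).
# policy[s][a]         = pi(a|s)
# reward_dist[s][a]    = {r: p(r|s,a)}
# trans[s][a][s_next]  = p(s'|s,a)
policy = [[0.5, 0.5], [0.0, 1.0]]
reward_dist = [
    [{0.0: 1.0}, {-1.0: 1.0}],
    [{1.0: 1.0}, {1.0: 1.0}],
]
trans = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[1.0, 0.0], [0.0, 1.0]],
]
num_states = len(policy)


def mean_reward(s):
    """r_pi(s) = sum_a pi(a|s) * sum_r p(r|s,a) * r."""
    return sum(
        policy[s][a] * sum(p * r for r, p in reward_dist[s][a].items())
        for a in range(len(policy[s]))
    )


def transition_prob(s, s_next):
    """p_pi(s'|s) = sum_a pi(a|s) * p(s'|s,a)."""
    return sum(
        policy[s][a] * trans[s][a][s_next] for a in range(len(policy[s]))
    )


# Collect the values for every state (and every pair of states).
r_pi = [mean_reward(s) for s in range(num_states)]
P_pi = [
    [transition_prob(s, s_next) for s_next in range(num_states)]
    for s in range(num_states)
]
```

Averaging over actions first removes the action from the model, so the policy and the environment together behave like a single Markov chain with per-state rewards.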
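Suppose that the states are indexed as $s_i$ with $i = 1, \dots, n$, where $n = |\mathcal{S}|$. For state $s_i$, (2.8) can be written as

$$v_\pi(s_i) = r_\pi(s_i) + \gamma \sum_{s_j \in \mathcal{S}} p_\pi(s_j|s_i)\, v_\pi(s_j). \qquad (2.9)$$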
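Let $v_\pi = [v_\pi(s_1), \dots, v_\pi(s_n)]^T \in \mathbb{R}^n$, $r_\pi = [r_\pi(s_1), \dots, r_\pi(s_n)]^T \in \mathbb{R}^n$, and $P_\pi \in \mathbb{R}^{n \times n}$ with $[P_\pi]_{ij} = p_\pi(s_j|s_i)$. Then, (2.9) can be written in the following matrix-vector form:

$$v_\pi = r_\pi + \gamma P_\pi v_\pi, \qquad (2.10)$$

where $v_\pi$ is the unknown to be solved, and $r_\pi$ and $P_\pi$ are known.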
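To make the matrix-vector form concrete, the sketch below evaluates the right-hand side $r_\pi + \gamma P_\pi v$ row by row, which is exactly the elementwise equation (2.9) for each state. To obtain a vector that can be checked against the equation, it repeatedly applies this map starting from zero. That fixed-point iteration is an assumption of this sketch, not a method stated in this section, and the values of `gamma`, `r_pi`, and `P_pi` are hypothetical.

```python
gamma = 0.9                        # discount rate (assumed value)
r_pi = [-0.5, 1.0]                 # hypothetical mean immediate rewards
P_pi = [[0.5, 0.5], [0.0, 1.0]]    # hypothetical transition matrix under pi
n = len(r_pi)


def bellman_rhs(v):
    """Return r_pi + gamma * P_pi v, computed one row (state) at a time."""
    return [
        r_pi[i] + gamma * sum(P_pi[i][j] * v[j] for j in range(n))
        for i in range(n)
    ]


# Repeatedly apply the right-hand side, starting from the zero vector.
v = [0.0] * n
for _ in range(1000):
    v = bellman_rhs(v)

# A solution of the matrix-vector equation reproduces itself under the map.
residual = max(abs(a - b) for a, b in zip(v, bellman_rhs(v)))
print(v, residual)
```

For this toy model the second row reads $v(s_2) = 1 + 0.9\, v(s_2)$, so $v(s_2) = 10$, and substituting into the first row gives $v(s_1) = 4/0.55 \approx 7.27$. The iterates approach these values.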
The matrix $P_\pi$ has some interesting properties. First, it is a nonnegative matrix, meaning that all its elements are equal to or greater than zero. This property is denoted as $P_\pi \ge 0$, where $0$ denotes a zero matrix with appropriate dimensions. In this book, $\ge$ or $\le$ represents an elementwise comparison operation. Second, $P_\pi$ is a stochastic matrix, meaning that the sum of the values in every row is equal to one. This property is denoted as $P_\pi \mathbf{1} = \mathbf{1}$, where $\mathbf{1} = [1, \dots, 1]^T$ has appropriate dimensions.
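Both properties are easy to check numerically. The following is a minimal sketch, assuming the transition matrix is stored as a list of rows. The helper names `is_nonnegative` and `is_row_stochastic` and the two example matrices are hypothetical.

```python
import math


def is_nonnegative(P):
    """Elementwise check that every entry of P is >= 0."""
    return all(p >= 0.0 for row in P for p in row)


def is_row_stochastic(P, tol=1e-12):
    """Check that every row of P sums to one (up to rounding error).

    Multiplying P by the all-ones vector yields the vector of row sums,
    so this tests whether that product is again the all-ones vector.
    """
    return all(math.isclose(sum(row), 1.0, abs_tol=tol) for row in P)


P_pi = [[0.5, 0.5], [0.0, 1.0]]          # hypothetical transition matrix
not_stochastic = [[0.5, 0.4], [0.0, 1.0]]  # first row sums to 0.9

print(is_nonnegative(P_pi), is_row_stochastic(P_pi))
print(is_row_stochastic(not_stochastic))
```

A tolerance is used for the row sums because probabilities stored as floating-point numbers rarely add up to exactly one.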
Consider the example shown in Figure 2.6. The matrix-vector form of the Bellman equation is
Substituting the specific values into the above equation gives
It can be seen that $P_\pi$ satisfies $P_\pi \mathbf{1} = \mathbf{1}$.
Figure 2.6: An example for demonstrating the matrix-vector form of the Bellman equation.